SeekSoulOnline Differential Enrichment Analysis Guide

Author: Ruifeng Gao

Updated: 2025-11-07

Overview

What Is Differential Enrichment?

Differential Enrichment aims to identify genes that are significantly differentially expressed across cell populations or experimental conditions, and to determine whether those genes are enriched within particular functions or pathways. The workflow typically involves two steps:

Differential Expression Analysis: discover genes with significant changes.
Functional Enrichment Analysis: interpret the functional and pathway context of those genes.

NOTE

Differential Enrichment is a core step in single-cell interpretation. Configure groupings and thresholds with the underlying biology in mind.

Biological Significance

Differential Enrichment plays a vital role in single-cell studies because it helps you:

Characterize cellular functions: identify cell-type-specific marker genes via differential expression to understand functional features across cell populations.
Explore biological processes: reveal how biological processes shift under different states through enrichment analyses.
Investigate disease mechanisms: compare healthy and disease conditions to pinpoint potential disease-associated pathways.
Discover drug targets: find genes and pathways that respond to treatment, providing leads for therapeutic development.

Analytical Value

A Differential Enrichment workflow enables researchers to obtain:

Cell-type marker gene lists for annotation and functional characterization.
Knowledge of active biological pathways that illuminate cellular functional states.
Candidate regulatory mechanisms to guide follow-up experiments.
Biological hypotheses that support deeper functional validation.

Using the Module

Start an Analysis by Creating a Comparison Scheme

Click Create Comparison in the upper-right corner of the page to open the comparison dialog, then click Create Comparison again to create a scheme. The form includes three sections: Comparison Name, Category, and Group.

Comparison Name: enter a name that clearly reflects the comparison so it is easy to distinguish and retrieve.
Category: choose the objects you want to analyze: cluster labels, cell annotation labels, samples, or grouped sample labels (required). If no suitable label exists, click Group to create one; once saved, it appears in the dropdown.
Group: select the groups you want to compare: cluster or annotation labels, samples, or grouped sample labels (optional). If you configure more than five comparison sets, click Expand to display them all (up to ten). The group before vs is the experimental (test) group, and the group after vs is the control; click vs to swap their order.

After defining the comparison scheme and configuring the Differential Analysis and Enrichment Analysis parameters, click Save to launch Differential Enrichment.

How to Interpret a Comparison Scheme

If Category uses cluster or annotation labels and Group uses samples or grouped sample labels, you compare the same cell type between two samples.
If Category uses samples or grouped sample labels and Group uses clusters or annotation labels, you compare two cell types within the same sample.
Selecting only Category (and leaving Group empty) performs one-vs-rest comparisons, where each group is contrasted with all other groups combined.
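The three rules above can be sketched as a small planning function. This is only an illustration of how the Category and Group selections might expand into individual comparisons; the function name `expand_comparisons` and the dictionary keys are hypothetical and do not describe SeekSoulOnline's internal implementation.

```python
def expand_comparisons(category_levels, group_pairs=None):
    """Expand a comparison scheme into a list of individual comparisons.

    category_levels: labels chosen under Category (e.g. clusters or samples).
    group_pairs: optional list of (test, control) pairs chosen under Group.
    """
    comparisons = []
    if not group_pairs:
        # Only Category selected: each level is tested against all other levels.
        for level in category_levels:
            rest = [other for other in category_levels if other != level]
            comparisons.append({"within": None, "test": level, "control": rest})
        return comparisons
    # Category plus Group: run every test-vs-control pair inside each Category level.
    for level in category_levels:
        for test, control in group_pairs:
            comparisons.append({"within": level, "test": test, "control": [control]})
    return comparisons
```

For example, passing two cell types as `category_levels` and one `("treated", "control")` pair yields two comparisons, one per cell type, each contrasting the treated sample with the control sample.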

Edit or Delete a Comparison Scheme

The scheme list records all configurations. Click View to modify a scheme or Delete to remove it. When finished, click the background or Close (upper right) to return to the main interface.

Differential Expression Analysis

Methods

References: presto documentation, Seurat FindMarkers

Software	Test	Core Principle	Advantages	Limitations	Recommended Scenarios
Presto	wilcox	Non-parametric Wilcoxon rank-sum test (efficient implementation that also reports auROC)	Efficient for large cohorts; memory friendly	May require downsampling under extreme sparsity or group imbalance	Large datasets (≥1,000 cells per group); combine with DownSample and Maximum cell count per cluster
FindMarkers	wilcox	Non-parametric Wilcoxon rank-sum test that compares expression ranks between groups	Minimal distribution assumptions; robust	Sensitive to severe group imbalance and variance differences	Default option, broadly robust
FindMarkers	bimod	Log-normal likelihood-ratio test	Matches log-normal expression patterns	Sensitive to distribution assumptions	Near-continuous, log-normal expression
FindMarkers	roc	ROC/auROC separability analysis	Intuitive separability metric	No p-values; focuses on separability	Marker discovery and auxiliary evaluation
FindMarkers	t	Student's t-test for mean differences	Simple and fast	Requires normality and equal variance; sensitive to outliers	Approximately normal distributions, moderate sample size
FindMarkers	negbinom	Negative binomial test (counts, over-dispersion)	Suitable for UMI counts and over-dispersed data	Slower; parameter intensive	Raw counts with high dispersion
FindMarkers	poisson	Poisson test for sparse counts	Straightforward sparse-count model	Ignores over-dispersion	Sparse, Poisson-like data
FindMarkers	LR	Logistic regression compared against a null model with a likelihood-ratio test	Supports covariates	Requires careful modeling	When covariates must be modeled
FindMarkers	MAST	GLM accounting for zero inflation	Tailored for zero-inflated single-cell data	Computationally intensive	Heavy zero inflation and rigorous modeling
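To make the wilcox rows more concrete, here is a minimal standard-library sketch of a Wilcoxon rank-sum test for a single gene, together with the auROC derived from the same U statistic. It uses the normal approximation with a tie correction and no continuity correction, so its p-values may differ slightly from those produced by presto or Seurat; the function names are illustrative only.

```python
import math


def rank_with_ties(values):
    """Return 1-based average ranks and the size of each group of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # average of the tied 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def wilcoxon_auroc(test, control):
    """Return (U, auROC, two-sided p-value) for test vs control expression values."""
    n1, n2 = len(test), len(control)
    ranks, tie_sizes = rank_with_ties(list(test) + list(control))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic of the test group
    auc = u / (n1 * n2)                      # auROC is U rescaled to [0, 1]
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        return u, auc, 1.0                   # all values identical: no evidence
    z = (u - n1 * n2 / 2) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u, auc, p_value
```

An auROC near 1 means the gene is consistently higher in the test group, near 0 consistently lower, and near 0.5 uninformative, which is why the roc method can rank markers without producing p-values.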

Parameter Notes

Min.pct: minimum fraction of cells expressing a gene (default 0.1, or 10%). Genes detected in fewer than this fraction of cells in both groups are excluded, which speeds up the analysis.
Logfc.threshold: absolute log2FC cutoff (default 0.25). Only genes with log2FC ≥ 0.25 or ≤ -0.25 are retained. Raising the cutoff accelerates analysis but may omit genes; the value must be > 0.
Threshold Type (p_val / p_val_adj): significance threshold applied to the raw or adjusted p-value (default p_val_adj < 0.05). If too many genes pass, lower it (e.g., 0.01). Set it to 1 to disable filtering. Valid range: [0, 1].
DownSample: toggles cell subsampling (enabled by default) to reduce computation time.
Maximum cell count per cluster: caps the number of cells per group. When DownSample is enabled, groups exceeding the cap are randomly subsampled; smaller groups use all cells.
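As a rough illustration of how the three thresholds combine, the sketch below filters one row of a differential expression table. The column names `pct.1`, `pct.2`, `avg_log2FC`, `p_val`, and `p_val_adj` follow Seurat's FindMarkers output and are assumed here; the exported SeekSoulOnline table may name them differently. The expression check follows Seurat's documented behavior, where a gene is kept if either group reaches Min.pct, and the p-value comparison is inclusive so that a cutoff of 1 keeps every gene.

```python
def passes_filters(row, min_pct=0.1, logfc_threshold=0.25,
                   p_cutoff=0.05, p_column="p_val_adj"):
    """Return True if a differential expression row survives all three thresholds.

    row: dict with keys "pct.1", "pct.2", "avg_log2FC", "p_val", "p_val_adj"
         (assumed column names, borrowed from Seurat's FindMarkers output).
    """
    # Min.pct: the gene must be detected in enough cells of at least one group.
    expressed_enough = max(row["pct.1"], row["pct.2"]) >= min_pct
    # Logfc.threshold: keep both up- and down-regulated genes.
    large_change = abs(row["avg_log2FC"]) >= logfc_threshold
    # Threshold Type: p_column selects raw ("p_val") or adjusted ("p_val_adj") values.
    significant = row[p_column] <= p_cutoff
    return expressed_enough and large_change and significant
```

Applying the function to every row with a list comprehension reproduces the effect of tightening or relaxing the parameters without rerunning the statistical test.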

Functional Enrichment Analysis

Methods

Reference: clusterProfiler documentation

Method	Function	Core Idea	Input	Output / Visualization	Recommended Use
ORA (over-representation analysis)	clusterProfiler::enricher	Hypergeometric/Fisher test for over-represented terms	Differential gene list + TERM2GENE (GO/KEGG/Reactome/MSigDB)	Enrichment tables; DotPlot, BarPlot	When you already have a differential gene set and want top terms
GSEA (gene set enrichment analysis)	clusterProfiler::GSEA	ES/NES with permutations to assess coordinated changes	Ranked gene list (e.g., by avg_log2FC) + TERM2GENE	NES, p.adjust; ES curves; Dot/Bar plots	When signals are subtle or you want to avoid hard thresholds
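The hypergeometric test behind ORA can be written in a few lines. The sketch below computes the one-sided over-representation p-value for a single term using only the standard library; the argument names are illustrative, and clusterProfiler additionally applies multiple-testing correction across terms, which is omitted here.

```python
from math import comb


def ora_pvalue(hits, query_size, term_size, universe_size):
    """One-sided hypergeometric p-value: P(X >= hits).

    hits:          differential genes that belong to the term
    query_size:    differential genes with any annotation
    term_size:     genes annotated to the term within the universe
    universe_size: all annotated background genes
    """
    upper = min(query_size, term_size)
    total = comb(universe_size, query_size)
    # Sum the probabilities of drawing `hits` or more term genes by chance.
    tail = sum(
        comb(term_size, i) * comb(universe_size - term_size, query_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total
```

With a universe of 10 genes, a term of 4 genes, and a query of 3 genes containing 2 term members, the function returns 40/120, or about 0.33, which would not be considered enriched.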

Databases and Datasets

The system supports databases for 100+ species (human, mouse, rat, zebrafish, fruit fly, worm, cat, dog, cow, chicken, horse, rhesus monkey, pig, platypus, opossum, green anole, Xenopus, yeast, chimpanzee, and more). See My Databases for details.
Choose databases and datasets according to species and study goals—for example, prefer GeneOntology for functional annotation and KEGG or Reactome for pathway analysis.

Database	Available Datasets
GeneOntology	Molecular Function, Biological Process, Cellular Component
Pathway	KEGG
MSigDB	H (Hallmark), C1 (positional), C2 (curated), C3 (regulatory targets), C4 (computational), C5 (ontology), C6 (oncogenic), C7 (immunologic), C8 (cell types), H_C2_C5 (combined library)
Reactome	Reactome pathway library

Parameter Notes

pvalueCutoff: p-value threshold for enrichment significance (commonly < 0.05).
qvalueCutoff: FDR threshold (commonly < 0.05; consider 0.01 for large datasets).
minGSSize: minimum gene set size (default 10) to filter very small sets and reduce false positives.
maxGSSize: maximum gene set size (default 500) to remove overly broad sets.
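The two size limits act as a pre-filter on the gene set library before any test is run. The following sketch assumes that set size is counted after restricting each set to the background genes, which is a common convention but is stated here as an assumption; the function and argument names are hypothetical.

```python
def filter_gene_sets(term2gene, universe, min_gs_size=10, max_gs_size=500):
    """Keep only gene sets whose usable size lies within [min_gs_size, max_gs_size].

    term2gene: dict mapping a term name to an iterable of gene symbols.
    universe:  iterable of background genes (assumed to define the usable genes).
    """
    background = set(universe)
    kept = {}
    for term, genes in term2gene.items():
        usable = set(genes) & background  # drop genes absent from the background
        if min_gs_size <= len(usable) <= max_gs_size:
            kept[term] = usable
    return kept
```

Very small sets tend to reach significance from one or two genes, while very large sets are rarely informative, so narrowing the range is often the quickest way to shorten a long result table.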

Interpreting the Results

Differential Expression Output

Top panel: the Group/Comparison selector is synchronized with the scheme dropdown. Click highlighted bars to jump to the corresponding details on the right.

Volcano Plot

Volcano plots summarize differential expression results:

X-axis: log2 fold change (log2FC)
Y-axis: −log10(p-value)
Colors:
- Red: significantly upregulated genes
- Blue: significantly downregulated genes
- Gray: non-significant genes
Threshold lines: vertical dashed lines mark log2FC cutoffs; the horizontal dashed line marks the significance cutoff.
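The coordinates and color of each point follow directly from the thresholds. Below is a minimal sketch of that mapping; whether the platform plots raw or adjusted p-values on the Y-axis is not stated above, so the function simply takes whichever p-value is passed in, and the `floor` argument (a hypothetical safeguard) prevents taking the logarithm of zero.

```python
import math


def volcano_point(log2fc, p_value, logfc_threshold=0.25, p_cutoff=0.05, floor=1e-300):
    """Return (x, y, color) for one gene on a volcano plot."""
    # Y-axis: -log10(p), with p clamped so that p == 0 does not raise an error.
    y = -math.log10(max(p_value, floor))
    significant = p_value <= p_cutoff
    if significant and log2fc >= logfc_threshold:
        color = "red"    # significantly upregulated
    elif significant and log2fc <= -logfc_threshold:
        color = "blue"   # significantly downregulated
    else:
        color = "gray"   # fails the significance or fold-change cutoff
    return log2fc, y, color
```

The dashed threshold lines on the plot correspond to x = ±`logfc_threshold` and y = −log10(`p_cutoff`).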

Heatmap

Heatmaps display how differential genes are expressed across groups:

Rows: differentially expressed genes
Columns: cell groups
Colors: expression levels (red = high, blue = low)
Clustering: hierarchical clustering for genes and cell groups

Violin Plot

Violin plots illustrate expression distributions by group:

X-axis: cell groups
Y-axis: expression level
Shape: density of the distribution
Boxplot overlay: median and interquartile statistics

Bottom panel: aggregated volcano and bubble plots that switch with the Group/Comparison perspective.

Enrichment Output

Middle panel:

Left table: differential and enrichment results with search, filter, and sorting. Click Analysis Parameters to adjust methods or thresholds and rerun. Use Download to export differential or enrichment tables.
Right plots: enrichment DotPlot, BarPlot, and ES curve based on the selected database and pathway focus.

DotPlot

X-axis: GeneRatio
Y-axis: pathway name
Bubble size: number of enriched genes
Color: significance

BarPlot

X-axis: enrichment score or gene count
Y-axis: pathway name
Color: significance level
Sorting: by enrichment magnitude or significance

GSEA Enrichment Score Plot

X-axis: gene rank
Y-axis: enrichment score
Curve: enrichment score trajectory
Vertical bars: positions of genes in the set
Heatmap: gene expression levels
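The curve in this plot is a running sum over the ranked gene list. The sketch below implements the standard weighted running-sum statistic for one gene set, assuming the genes are already sorted from highest to lowest score (for example by avg_log2FC). It returns only the raw enrichment score and the curve; the permutation step that yields NES and p-values is left out, and the function name is illustrative.

```python
def running_enrichment_score(ranked_genes, gene_set, exponent=1.0):
    """Return (ES, curve) for a ranked list of (gene, score) pairs.

    ranked_genes: list of (gene, score) sorted by descending score.
    gene_set:     set of gene names belonging to the pathway.
    exponent:     weight applied to |score| for hits (1.0 is the usual choice).
    """
    hits = [gene in gene_set for gene, _ in ranked_genes]
    hit_weight_total = sum(
        abs(score) ** exponent
        for (_, score), hit in zip(ranked_genes, hits) if hit
    )
    n_miss = len(ranked_genes) - sum(hits)
    if hit_weight_total == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, curve = 0.0, []
    for (_, score), hit in zip(ranked_genes, hits):
        if hit:
            running += abs(score) ** exponent / hit_weight_total  # step up at a set member
        else:
            running -= 1.0 / n_miss                               # step down otherwise
        curve.append(running)
    es = max(curve, key=abs)  # largest deviation from zero, sign preserved
    return es, curve
```

A positive ES with the peak near the left of the curve indicates that the set's genes concentrate among the most upregulated genes; a negative ES with the trough near the right indicates the opposite.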

TIP

Differential Enrichment results can be further refined in the Plotting Tool for clearer interpretation and reporting.

Summary

The SeekSoulOnline Differential Enrichment module delivers an end-to-end workflow—from differential expression to functional enrichment—to help you interpret the biological significance of single-cell data. With well-chosen parameters and careful interpretation, you can obtain high-quality biological insights.

FAQ

Q1: How should I choose a differential analysis method?

Default to Wilcoxon (FindMarkers: wilcox) for a robust, assumption-light option.
Large cohorts (≥1,000 cells per group) or high performance needs: use Presto (efficient Wilcoxon/auROC).
Over-dispersed counts: consider negbinom.
Heavy zero inflation needing modeling: use MAST.
Approximate normal distributions with moderate sample sizes: a t-test works.
When covariates are required: select LR (likelihood ratio).

Q2: What if I obtain too many or too few differential genes?

Too many: raise the logFC threshold (e.g., 0.25 → 0.5), increase min.pct (0.05 → 0.1), and tighten p_val_adj (0.05 → 0.01).
Too few: lower the logFC threshold, reduce min.pct, or relax p_val_adj (0.05 → 0.1 or 1).
Slow performance on large datasets: enable DownSample and set Maximum cell count per cluster (e.g., 3,000–10,000).

Q3: Should I use p-values or adjusted p-values?

Prefer adjusted p-values (p_val_adj ≤ 0.05); for multiple comparisons or large pathway sets, use stricter cutoffs (≤0.01).
For exploratory analysis where filtering is undesirable, temporarily set the threshold to 1.
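To show why adjusted p-values are stricter, the sketch below implements two common corrections. Seurat's FindMarkers documents its p_val_adj as a Bonferroni correction over all genes in the dataset, and clusterProfiler defaults to Benjamini-Hochberg (BH) for p.adjust; whether SeekSoulOnline uses exactly these settings is an assumption, and the function names are illustrative.

```python
def bonferroni(p_values, n_tests=None):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = n_tests if n_tests is not None else len(p_values)
    return [min(1.0, p * n) for p in p_values]


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    n = len(p_values)
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted
```

Passing the total number of genes as `n_tests` to `bonferroni` mimics correcting over the whole dataset rather than only the genes that survived pre-filtering, which is one reason adjusted values of exactly 1 are common in marker tables.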

Q4: When should I use ORA versus GSEA?

Use ORA (enricher) when you already have a differential gene set and want to focus on the most significant terms.
Use GSEA when signals are subtle or you want to avoid hard thresholds.
Database tips: use GO (BP/MF/CC) for functional annotation, KEGG/Reactome for pathways, or an MSigDB subset as needed.

Q5: What if the analysis runs slowly?

Switch to Presto, enable DownSample, and lower Maximum cell count per cluster.
Narrow the database or pathway scope.
Check whether the scheme contains too many comparisons; split them into batches if necessary.
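The downsampling option can be pictured as a per-group cap on cell barcodes. The sketch below is an assumption-laden illustration, not the platform's code: the function name, the `seed` argument, and the dictionary layout are all hypothetical. Groups above the cap are randomly subsampled without replacement, and smaller groups are left untouched.

```python
import random


def downsample_groups(cells_by_group, max_cells, seed=0):
    """Cap every group at max_cells by random subsampling without replacement.

    cells_by_group: dict mapping a group label to a list of cell barcodes.
    seed:           fixed seed so that repeated runs select the same cells.
    """
    rng = random.Random(seed)
    sampled = {}
    for group, cells in cells_by_group.items():
        cells = list(cells)
        if len(cells) > max_cells:
            sampled[group] = rng.sample(cells, max_cells)  # random subset
        else:
            sampled[group] = cells                         # small group: keep all
    return sampled
```

Because the subset is random, borderline genes can move in or out of the significant list between runs; fixing the seed, as done here, is one way to keep results reproducible when comparing parameter settings.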

Q6: How do I troubleshoot unexpected results?

Verify that the Group/Comparison labels and directions are correct (the group before vs is the test group).
Check whether subsampling introduced variability; rerun with adjusted thresholds.
Consider batch or individual effects; adopt more robust methods or stricter thresholds if needed.

Q7: What if the enrichment database for my species is missing?

SeekSoulOnline ships with libraries for 100+ species. Ensure the species setting is correct; if the database is missing, upload one manually. See My Databases for instructions.

TIP

Align parameters with the biological question, revisit Analysis Parameters to rerun and compare results, and interpret the tables alongside the provided visualizations.
